In this lab, we will explore some of R’s functionality with spatial
data, with special attention given to the sf package.
For more information about sf, you can visit their website. Robin Lovelace’s
book Geocomputation with R (available online) is also a really
helpful and educational source for learning sf. In
fact, much of this lab is just an abbreviated version of that book. For
some of the longer examples, it is highly recommended to use R’s
scripting functions to run the examples to save on re-typing.
Before starting, remember to create a working directory
(e.g. module12). Next download the files for today’s lab
from the Canvas page and move them to this directory. You’ll need the
following files:
Make sure to unzip the zip files so that R can access the content. Note that on Windows, you will need to right-click on the file and select ‘Extract files’.
Base R has no built-in structure for spatial data, so you will need to install and load the following packages (you should have some of these from previous modules):
library(ggplot2)
library(terra)
library(RColorBrewer)
library(sf)
library(viridis)
sf is an R package designed to work with spatial data
organized as “simple features” (hence, ‘sf’). Mostly, it supersedes the
sp package (written by the same people), but it also
collapses a lot of other R packages into one. In fact, just a few years
ago, if you were to take this course, you would have loaded all of these
packages:
library(maptools)
library(rgdal)
library(rgeos)
library(sp)
Now, a simple
library(sf)
will suffice. The sf package is able to provide all the
functionality it does because it interfaces with three widely adopted
programming standards: PROJ, GDAL, and GEOS. These provide for
coordinate reference systems, reading and writing of spatial data, and
geometric operations, respectively, but more on this in a moment.
Note that all sf functions are prefixed with
st_ (a legacy of this R package’s origins in PostGIS, where
‘st’ means “spatial type”).
A simple feature is, in the words of the sf authors, “a
formal standard (ISO 19125-1:2004) that describes how objects in the
real world can be represented in computers, with emphasis on the spatial
geometry of these objects” (ref). In
other words, it’s structured data that provides information about a
location in space, including its shape.
The way that sf chooses to represent simple features in
R should be familiar to you because they are just fancy data.frames. To
demonstrate this, we’ll load a file of county outlines for North
Carolina (this file is included when you install the sf
package):
path_to_data <- system.file("shape/nc.shp", package="sf")
north_carolina <- st_read(path_to_data, quiet = TRUE)
north_carolina <- north_carolina[ , c("CNTY_ID", "NAME", "AREA", "PERIMETER")]
north_carolina
## Simple feature collection with 100 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS: NAD27
## First 10 features:
## CNTY_ID NAME AREA PERIMETER geometry
## 1 1825 Ashe 0.114 1.442 MULTIPOLYGON (((-81.47276 3...
## 2 1827 Alleghany 0.061 1.231 MULTIPOLYGON (((-81.23989 3...
## 3 1828 Surry 0.143 1.630 MULTIPOLYGON (((-80.45634 3...
## 4 1831 Currituck 0.070 2.968 MULTIPOLYGON (((-76.00897 3...
## 5 1832 Northampton 0.153 2.206 MULTIPOLYGON (((-77.21767 3...
## 6 1833 Hertford 0.097 1.670 MULTIPOLYGON (((-76.74506 3...
## 7 1834 Camden 0.062 1.547 MULTIPOLYGON (((-76.00897 3...
## 8 1835 Gates 0.091 1.284 MULTIPOLYGON (((-76.56251 3...
## 9 1836 Warren 0.118 1.421 MULTIPOLYGON (((-78.30876 3...
## 10 1837 Stokes 0.124 1.428 MULTIPOLYGON (((-80.02567 3...
You can summarize this somewhat verbose printout by noting that simple features fit a simple formula:
\[ sf = attributes + geometry + crs
\]
This formula also suggests the kinds of ways that you might interact
with an sf object by, for example, changing its crs, or
filtering based on its attributes (or geometry), or manipulating its
geometry.
Attributes are properties of a feature. In this case, the
features are counties in North Carolina, and their attributes are things
like name and area. In an sf data.frame, each
feature is a row, and each attribute is a column. In the
north_carolina object, for example, the first feature has
the name “Ashe” and its county ID is 1825.
A very special attribute column is called the geometry
(sometimes labeled ‘geom’ or ‘shape’). It consists of a point or set of
points (specifically, their coordinates) that define the shape and
location of the feature. The simple feature standard includes 17
geometry types, 7 of which are supported by sf: point,
multipoint, linestring, multilinestring, polygon, multipolygon, and
geometry collection.
As mentioned already, these geometries are just a series of points:
point_one <- st_point(c(0, 3))
point_two <- st_point(c(5, 7))
a_line <- st_linestring(c(point_one, point_two))
If you print these geometries
point_one
## POINT (0 3)
a_line
## LINESTRING (0 3, 5 7)
you see that they are represented as a text string. This is the Well Known Text (WKT) standard for specifying geometries. It tells us what kind of geometry the feature is and lists its x-y coordinates separated by commas.
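Polygons follow the same pattern, except their coordinates are supplied as one or more rings, and each ring must close on itself (the first and last points match). A minimal sketch with made-up coordinates:

```r
library(sf)

# a polygon is built from a list of coordinate matrices; the first (and
# here only) matrix is the outer ring, which must start and end on the
# same point
outer_ring <- rbind(c(0, 0), c(4, 0), c(4, 4), c(0, 4), c(0, 0))
a_polygon <- st_polygon(list(outer_ring))
a_polygon
## POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))
```

Any inner rings supplied after the first matrix would be treated as holes in the polygon.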
If you want to know what geometry type your simple feature contains, try:
st_geometry_type(a_line)
## [1] LINESTRING
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
The final ingredient in a simple feature is its spatial reference system, or coordinate reference system (CRS). A CRS provides two crucial pieces of information: (i) what rule we use to assign coordinates to points and (ii) what datum to use. It is not an exaggeration to say that the CRS is the most important element of a simple feature, for without a CRS, the numbers in the geometry column are just that, numbers, rather than full-blooded spatial coordinates.
Understanding what a coordinate assignment rule does is beyond the scope of this lab, but the datum deserves some attention. In effect, it specifies three things:
- where the origin of the coordinate system, POINT (0 0), sits on the Earth’s surface,
- what units the coordinates are in, so that we can interpret POINT (5 7) as being 5 meters east and seven meters north of the origin, or - worse - 5 feet east and 7 feet north, and
- what model of the Earth’s shape (the reference ellipsoid) the coordinates are measured against.
As with the geometries, the standard for representing a CRS is WKT, though the easiest way to identify a CRS is to use its EPSG code. To find the EPSG code for a CRS, you can visit this website: spatialreference.org.
The most widely used CRS is the World Geodetic System 84 (WGS 84, a geographic system) whose EPSG code is 4326:
st_crs(4326)
## Coordinate Reference System:
## User input: EPSG:4326
## wkt:
## GEOGCRS["WGS 84",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)"],
## MEMBER["World Geodetic System 1984 (G730)"],
## MEMBER["World Geodetic System 1984 (G873)"],
## MEMBER["World Geodetic System 1984 (G1150)"],
## MEMBER["World Geodetic System 1984 (G1674)"],
## MEMBER["World Geodetic System 1984 (G1762)"],
## MEMBER["World Geodetic System 1984 (G2139)"],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[2.0]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Horizontal component of 3D system."],
## AREA["World."],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
If you are familiar with the PROJ4-string syntax, you can retrieve that from a CRS with:
st_crs(4326)$proj4string
## [1] "+proj=longlat +datum=WGS84 +no_defs"
However, current open standards specified by PROJ and GDAL discourage the use of PROJ4-string syntax in favor of WKT, so it is probably best to get used to the latter now.
There’s actually one more element to a simple feature, but it is not as vital as the others and is really already implicit in the geometry. That is the bounding box. This is an object defined by the spatial extent of the data: the minimum and maximum x and y coordinates. You can retrieve the bounding box of a simple feature this way:
st_bbox(north_carolina)
## xmin ymin xmax ymax
## -84.32385 33.88199 -75.45698 36.58965
There are myriad uses for the bounding box, though we need not dwell on them here.
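One quick example, though: st_as_sfc converts a bounding box into an actual rectangular polygon geometry, which can then be used like any other geometry (for instance, to crop another layer to the same extent). A minimal sketch:

```r
library(sf)

north_carolina <- st_read(system.file("shape/nc.shp", package = "sf"),
                          quiet = TRUE)

# turn the bounding box into a one-feature rectangular polygon geometry
nc_extent <- st_as_sfc(st_bbox(north_carolina))
st_geometry_type(nc_extent)
```

The result carries the same CRS as the object the bounding box came from.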
Reading and writing spatial data, it turns out, is quite the chore.
The solution sf relies on is to interface with GDAL, which
handles lots of different spatial data types (it’s kinda its whole
purpose). Currently supported (vector) spatial data types can be found
at GDAL.org.
Perhaps the most common spatial data type - because ESRI is a thing - is
the shapefile, which has a .shp file extension.
In sf, the function for reading in spatial data is
st_read. Here is the nitty-gritty and, perhaps, needlessly
verbose version first:
NY8 <- st_read(dsn = "./NY_data/NY8_utm18.shp",
layer = "NY8_utm18",
drivers = "ESRI Shapefile")
## options: ESRI Shapefile
## Reading layer `NY8_utm18' from data source
## `/Users/u0784726/Dropbox/Data/devtools/geog5680/12 Spatial data in R/NY_data/NY8_utm18.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 281 features and 17 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 358241.9 ymin: 4649755 xmax: 480393.1 ymax: 4808545
## Projected CRS: WGS 84 / UTM zone 18N
dsn stands for “data source name” and specifies where
the data is coming from, whether a file directory, a database, or
something else. layer is the layer in the data source to be
read in. Finally, drivers tells GDAL what format the file
is in or what structure it has, so it knows how to correctly interpret
the file. All of this information is printed to the console when you
execute st_read.
In this case, we are using a simple ESRI shapefile, so the data
source and layer are basically the same thing. Furthermore,
sf is good at guessing the driver based on the file
extension, so the driver does not normally need to be specified. Hence,
we could just as well have written:
NY8 <- st_read("./NY_data/NY8_utm18.shp")
## Reading layer `NY8_utm18' from data source
## `/Users/u0784726/Dropbox/Data/devtools/geog5680/12 Spatial data in R/NY_data/NY8_utm18.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 281 features and 17 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 358241.9 ymin: 4649755 xmax: 480393.1 ymax: 4808545
## Projected CRS: WGS 84 / UTM zone 18N
And here’s what this looks like:
Sometimes you have spatial data, but it is not in a spatial data format. Usually, this means you have a table or spreadsheet with columns for the x and y coordinates.
wna_climate <- read.csv("./WNAclimate.csv")
head(wna_climate)
## WNASEQ LONDD LATDD ELEV totsnopt annp Jan_Tmp Jul_Tmp
## 1 1 -107.9333 50.8736 686 78.7869 326 -13.9 18.8
## 2 2 -105.2670 55.2670 369 145.3526 499 -21.3 16.8
## 3 3 -102.5086 41.7214 1163 42.6544 450 -4.2 23.3
## 4 4 -110.2606 44.2986 2362 255.1009 489 -10.9 14.1
## 5 5 -114.1500 59.2500 880 164.8924 412 -23.9 14.4
## 6 6 -120.6667 57.4500 900 141.9260 451 -17.5 13.8
This can be converted to a simple feature using the
st_as_sf function like so:
wna_climate <- st_as_sf(wna_climate,
coords = c("LONDD", "LATDD"),
crs = 4326)
wna_climate
## Simple feature collection with 2012 features and 6 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -138 ymin: 29.8333 xmax: -100 ymax: 60
## Geodetic CRS: WGS 84
## First 10 features:
## WNASEQ ELEV totsnopt annp Jan_Tmp Jul_Tmp geometry
## 1 1 686 78.7869 326 -13.9 18.8 POINT (-107.9333 50.8736)
## 2 2 369 145.3526 499 -21.3 16.8 POINT (-105.267 55.267)
## 3 3 1163 42.6544 450 -4.2 23.3 POINT (-102.5086 41.7214)
## 4 4 2362 255.1009 489 -10.9 14.1 POINT (-110.2606 44.2986)
## 5 5 880 164.8924 412 -23.9 14.4 POINT (-114.15 59.25)
## 6 6 900 141.9260 451 -17.5 13.8 POINT (-120.6667 57.45)
## 7 7 1100 183.4187 492 -18.8 13.2 POINT (-119.7167 56.7167)
## 8 8 1480 191.5657 606 -11.0 12.8 POINT (-114.6 50.7667)
## 9 9 651 144.9015 443 -22.4 15.3 POINT (-122.1667 59.5167)
## 10 10 725 151.0778 455 -23.5 15.4 POINT (-112.1 57.7667)
The function just needs to know what columns the x and y coordinates are in and what CRS they are specified in. And here’s what it looks like:
The sf function for writing simple features to disk is
st_write. It is almost an exact mirror of
st_read, but it also requires that you specify the simple
feature object in your R environment that you want to write to disk. If
the layer already exists, you will need to specify
delete_layer = TRUE.
st_write(obj = wna_climate,
dsn = "./wnaclim.shp",
layer = "wnaclim",
         driver = "ESRI Shapefile")
or, more simply:
st_write(wna_climate, dsn = "./wnaclim.shp")
The cardinal rule for working with any spatial data is to make sure all of it is in the same CRS. This ensures that any analysis which combines multiple sources is correctly comparing values at the same locations. Never ever ever ever do anything with your data until you are sure you’ve got the CRS right.
The st_crs() function allows you to quickly check the
CRS for any object.
st_crs(NY8)
## Coordinate Reference System:
## User input: WGS 84 / UTM zone 18N
## wkt:
## PROJCRS["WGS 84 / UTM zone 18N",
## BASEGEOGCRS["WGS 84",
## DATUM["D_unknown",
## ELLIPSOID["WGS84",6378137,298.257223563,
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["Degree",0.0174532925199433]]],
## CONVERSION["UTM zone 18N",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-75,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]],
## ID["EPSG",16018]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]]
You can also check the EPSG code (if specified):
st_crs(NY8)$epsg
## [1] NA
st_crs(wna_climate)$epsg
## [1] 4326
And you can get the name of a CRS this way:
format(st_crs(NY8))
## [1] "WGS 84 / UTM zone 18N"
There are two methods to set the CRS for a spatial object:
st_crs<- and st_set_crs.
wna_climate <- st_set_crs(wna_climate, 4326)
st_crs(wna_climate) <- 4326
# st_crs(wna_climate)
Note: this should only be used when the simple feature is missing a CRS and you know what it is. It is NOT for re-projecting the sf object to a new coordinate system.
The st_transform() function allows you to project your
sf object to a new CRS. This is particularly useful if you have multiple
data sources with different original coordinate systems.
# st_crs(NY8)
NY8 <- st_transform(NY8, crs = 4326)
# st_crs(NY8)
As a reminder: when you read in spatial data, the first thing you
should use is st_crs to check the CRS and
st_transform to re-project if necessary.
The attribute part of a sf object is a data.frame, so
you can use all the methods we have previously looked at for data
manipulation in working with attributes.
oregon_tann <- read_sf("./oregon/oregontann.shp")
class(oregon_tann)
## [1] "sf" "tbl_df" "tbl" "data.frame"
If you enter the name of an sf object, it will print the
first few rows of the attribute table:
oregon_tann
## Simple feature collection with 92 features and 4 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -124.567 ymin: 42.05 xmax: -116.967 ymax: 46.15
## CRS: NA
## # A tibble: 92 × 5
## elevation tann coords_x1 coords_x2 geometry
## <int> <dbl> <dbl> <dbl> <POINT>
## 1 846 9.6 -121. 44.9 (-120.717 44.917)
## 2 96 12.5 -120. 45.7 (-120.2 45.717)
## 3 543 11.1 -123. 42.2 (-122.717 42.217)
## 4 2 10.3 -124. 46.2 (-123.883 46.15)
## 5 1027 7.6 -118. 44.8 (-117.817 44.833)
## 6 1050 8.4 -118. 44.8 (-117.833 44.783)
## 7 24 10.8 -124. 43.1 (-124.383 43.117)
## 8 1097 7.7 -121. 44.1 (-121.317 44.067)
## 9 999 9.5 -118. 43.9 (-118.167 43.917)
## 10 18 11.2 -122. 45.6 (-121.95 45.633)
## # ℹ 82 more rows
# get elevation and tann columns
# method 1
oregon_tann2 <- oregon_tann[ , c("elevation", "tann")]
# method 2
oregon_tann2 <- subset(oregon_tann, select = c(elevation, tann))
names(oregon_tann)
## [1] "elevation" "tann" "coords_x1" "coords_x2" "geometry"
names(oregon_tann2)
## [1] "elevation" "tann" "geometry"
Notice this very important difference between regular data.frames and
sf data.frames: when you subset by columns, even though you
do not explicitly state that you want to keep the geometry column, it
keeps that column anyway. In this sense, the geometry column is said to
be “sticky.”
Subsetting the data by rows works in the same way as before. So we
can carry out conditional selection of locations by using the usual
comparison operators (<, <=, ==, !=, >=, >).
For example, to select only the points above 1000 m elevation in the
Oregon data set:
# get features above 1000 meters
# method 1
oregon_tann3 <- oregon_tann[oregon_tann$elevation > 1000, ]
# method 2
oregon_tann3 <- subset(oregon_tann, subset = elevation > 1000)
New variables can easily be appended to an existing sf
object using the following notation:
# method 1
oregon_tann$rando <- runif(n = nrow(oregon_tann))
# method 2
oregon_tann[, "rando"] <- runif(n = nrow(oregon_tann))
names(oregon_tann)
## [1] "elevation" "tann" "coords_x1" "coords_x2" "geometry" "rando"
If you need to extract any variable from a sf object to
a standard R vector, you can again use the standard notation. Note that
if you use [,] to specify the columns, you need to add
drop = TRUE to remove the geometry:
# method 1
elevation <- oregon_tann$elevation
# method 2
# if you don't specify drop = TRUE, it'll keep the sticky geometry column
elevation <- oregon_tann[ , "elevation", drop = TRUE]
elevation[1:10]
## [1] 846 96 543 2 1027 1050 24 1097 999 18
If you need only the geometry (the set of coordinates, or vector definitions), these can be extracted as follows:
# method 1
geometry <- st_geometry(oregon_tann)
# method 2
geometry <- oregon_tann$geometry
geometry
## Geometry set for 92 features
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -124.567 ymin: 42.05 xmax: -116.967 ymax: 46.15
## CRS: NA
## First 5 geometries:
## POINT (-120.717 44.917)
## POINT (-120.2 45.717)
## POINT (-122.717 42.217)
## POINT (-123.883 46.15)
## POINT (-117.817 44.833)
In case you just want the attributes, not the geometry:
attributes <- st_drop_geometry(oregon_tann)
head(attributes)
## # A tibble: 6 × 5
## elevation tann coords_x1 coords_x2 rando
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 846 9.6 -121. 44.9 0.969
## 2 96 12.5 -120. 45.7 0.706
## 3 543 11.1 -123. 42.2 0.444
## 4 2 10.3 -124. 46.2 0.493
## 5 1027 7.6 -118. 44.8 0.591
## 6 1050 8.4 -118. 44.8 0.554
Note: this is actually a special sort of data.frame
called a tibble. Not important to know about here, but it
does print slightly differently.
Spatial operations are like attribute operations, but they work with the geometry column rather than the attributes. There are loads of these functions, but we will just review some of the more important ones here.
This is probably the biggest one. Basically, you are taking one
geometry and using it to filter other geometries. To demonstrate this,
first we’ll make some random points in the north_carolina
simple feature. Well, first-first, we need to project the simple
features, since sf will protest if you try to do spatial
operations on longitude and latitude, as several of these methods
require the calculation of distances between locations.
north_carolina <- st_transform(north_carolina, crs = 26918)
Now use st_sample to generate the random points:
set.seed(1234)
random_pnts <- st_sample(north_carolina, size = 500)
random_pnts <- st_as_sf(random_pnts)
Now, we can use one geometry to filter out a second one. To obtain
just the points in, say, Pasquotank County, we first subset the North
Carolina sf object to extract only this county:
pasquotank <- subset(north_carolina, NAME == "Pasquotank")
Then use the st_filter() function with the county
polygon to get only the points located in that polygon:
filtered_pnts <- st_filter(random_pnts, pasquotank)
Now, you know where Pasquotank County is!
Internally, st_filter assumes a “topological” or spatial
relationship defined by what the sf authors refer to as a
spatial predicate (.predicate). By default,
st_intersects works to find the geometry of one object
located within another. We can, however, specify other spatial
relationships. For example, to get all the points outside
Pasquotank:
filtered_pnts <- st_filter(random_pnts, pasquotank, .predicate = st_disjoint)
Another useful predicate is st_is_within_distance, which
requires that you pass an additional distance (dist)
argument to the filter. The dist argument is in units
specified by the CRS, in this case meters.
filtered_pnts <- st_filter(random_pnts,
pasquotank,
.predicate = st_is_within_distance,
dist = 50000)
With spatial operations, the geometry is preserved (mostly). With geometric operations, the whole point is to manipulate the geometry. Again, we are just going to hit the highlights. It is worth emphasizing that these operations will often behave differently depending on the geometry type.
the_heart_of_pasquotank <- st_centroid(pasquotank)
## Warning: st_centroid assumes attributes are constant over geometries
the_heft_of_pasquotank <- st_buffer(pasquotank, dist = 50000)
This one merges geometries and dissolves interior borders when applied to polygons.
north_carolina_boundary <- st_union(north_carolina)
To cast a geometry is to change it from one geometry type to another. For example, to convert the boundary of North Carolina to points (the vertices of the polygon):
north_carolina_points <- st_cast(north_carolina_boundary, "POINT")
If we convert to a LINESTRING object, this acts to
separate out the individual polygons:
north_carolina_lines <- st_cast(north_carolina_boundary, "MULTILINESTRING")
north_carolina_lines <- st_cast(north_carolina_lines, "LINESTRING")
If you can’t tell, it was broken into six lines: one for the mainland, and the other five for the ecological (and cultural) disaster known as the Outer Banks.
To make simple plots of an sf object, you can use R’s
base function plot():
plot(oregon_tann2)
Notice that it creates separate plots for each attribute. If you would prefer to plot the geometry itself, you have to say so explicitly.
plot(st_geometry(oregon_tann2))
One of the easiest ways to improve on these base plots is to use
ggplot2. This contains a special plotting geometry,
geom_sf, designed to work with sf objects.
Here, we’ll use a subset of polygons from the NY8 data to
illustrate how this works. (Note that geom_sf refers to a
ggplot2 geometry, not a sf geometry.)
binghamton <- subset(NY8, AREANAME == "Binghamton city")
We can now plot this by calling the ggplot() function
and adding the sf object with geom_sf:
ggplot() +
geom_sf(data = binghamton) +
theme_bw()
Multiple layers can be added to a plot by adding additional
geom_sf functions. Here, we first create a new
sf object containing Binghamton and its neighboring
polygons, then create some random points for plotting:
sf::sf_use_s2(FALSE)
bingies_neighbors <- st_filter(NY8, binghamton)
random_pnts <- st_sample(bingies_neighbors, size = 25)
random_pnts <- st_as_sf(random_pnts)
Now we can plot these: first the larger set of polygons, then Binghamton City and finally the points:
ggplot() +
geom_sf(data = bingies_neighbors) +
geom_sf(data = binghamton, fill = "blue") +
geom_sf(data = random_pnts, color = "darkgreen") +
theme_bw()
We can create thematic maps by specifying the name of a variable in
the geom_sf() function:
names(binghamton)
## [1] "AREANAME" "AREAKEY" "X" "Y" "POP8"
## [6] "TRACTCAS" "PROPCAS" "PCTOWNHOME" "PCTAGE65P" "Z"
## [11] "AVGIDIST" "PEXPOSURE" "Cases" "Xm" "Ym"
## [16] "Xshift" "Yshift" "geometry"
ggplot() +
geom_sf(data = binghamton, aes(fill = POP8)) +
theme_bw()
Here, we will use the viridis color scale, which is
colorblind safe. This comes with several color palette
options.
ggplot() +
geom_sf(data = binghamton, aes(fill = PEXPOSURE)) +
scale_fill_viridis(option = "viridis") +
theme_bw()
ggplot() +
geom_sf(data = binghamton, aes(fill = PEXPOSURE)) +
scale_fill_viridis(option = "magma") +
theme_bw()
By default, geom_sf transforms all sf
objects to WGS84 (or latitude and longitude), but you can change this
with coord_sf. This takes an argument datum
that can be used to specify a different projection. Here we plot the
Binghamton data using a UTM projection (zone 18N = EPSG code 26918).
Note the change in both the projection of the data, and the axes:
ggplot() +
geom_sf(data = binghamton, aes(fill = PEXPOSURE)) +
scale_fill_viridis(option = "viridis") +
coord_sf(datum = 26918) +
theme_bw()
You can also use this to zoom in on different parts of the map.
ggplot() +
geom_sf(data = binghamton, aes(fill = PEXPOSURE)) +
scale_fill_viridis(option = "viridis") +
coord_sf(xlim = c(-75.93, -75.88), ylim = c(42.09, 42.13)) +
theme_bw()
Up till now, we have been working with vector spatial data. These are geometries composed of points defined by their coordinates. An alternative form of spatial data is known as a raster. This is gridded data. It takes the form of a rectangle composed of squares of equal size, which are sometimes called ‘cells’ or ‘pixels’. Each cell stores some kind of value.
This simplifies the geometry, which can be specified by two pieces of
information: the spatial extent of the raster and the resolution of the
cells. Here we create a blank raster with 10 rows and 10 columns using
the rast() function, and assign random values to each
cell:
r <- rast(nrow = 10, ncol = 10)
r[] <- runif(n = 100)
r
## class : SpatRaster
## dimensions : 10, 10, 1 (nrow, ncol, nlyr)
## resolution : 36, 18 (x, y)
## extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84
## source(s) : memory
## name : lyr.1
## min value : 0.01260905
## max value : 0.98447369
rast objects can be plotted using the base
plot() command:
plot(r)
The terra package offers a wide array of functions for dealing with gridded data, including the ability to read from many widely used file formats, like remote sensing images (e.g. GeoTiffs), NetCDF, and HDF formats. We will use it here to work with gridded air temperature data (air.mon.ltm.nc) from the NCAR NCEP reanalysis project. This is the long term means for each month from 1981-2010. The file has 12 layers (one per month) and one variable (air).
To read in gridded data, use the rast() function.
air_temp <- rast("./air.mon.ltm.nc")
## Warning: [rast] unknown extent
air_temp
## class : SpatRaster
## dimensions : 73, 144, 24 (nrow, ncol, nlyr)
## resolution : 1, 1 (x, y)
## extent : 0, 144, 0, 73 (xmin, xmax, ymin, ymax)
## coord. ref. :
## sources : air.mon.ltm.nc://air (12 layers)
## air.mon.ltm.nc://valid_yr_count (12 layers)
## varnames : air
## valid_yr_count
## names : air_1, air_2, air_3, air_4, air_5, air_6, ...
When we print the rast object created, the second line
of the output lists the dimensions of the data. Note that here, this has
73 rows, 144 columns and 24 layers (12 monthly layers for each of the
two variables, air and valid_yr_count).
In this case, rast has not set the spatial extent of the
data correctly, so we can do that by hand. The NCAR data span from 0-360
longitude and -90 to 90 latitude:
set.ext(air_temp, value = c(0, 360, -90, 90))
ext(air_temp)
## SpatExtent : 0, 360, -90, 90 (xmin, xmax, ymin, ymax)
You can access any of the layers using R’s list notation
(e.g. rast[[1]]). To extract the first layer (here, January
air temperature):
jan_temp = air_temp[[1]]
jan_temp
## class : SpatRaster
## dimensions : 73, 144, 1 (nrow, ncol, nlyr)
## resolution : 2.5, 2.465753 (x, y)
## extent : 0, 360, -90, 90 (xmin, xmax, ymin, ymax)
## coord. ref. :
## source : air.mon.ltm.nc://air
## varname : air
## name : air_1
We can write rast objects back to file using
writeRaster() (I’ll bet you never thought it would be
called that). Here we write out to a TIFF format. You can see the full
list of available formats for reading and writing by running the
writeFormats() function in the console.
writeRaster(jan_temp,
filename = "./jan_temp.tif",
overwrite = TRUE)
Set CRS:
crs(jan_temp) <- "EPSG:4326"
Again, this should not be used to change the CRS, only set it.
Get CRS:
crs(jan_temp, proj = TRUE)
## [1] "+proj=longlat +datum=WGS84 +no_defs"
Also, note that crs is for rast (raster)
objects, st_crs for sf (vector) objects.
Transform CRS: note that terra can also use the PROJ4-string syntax (a holdover from its original development):
weird_crs <- crs("+proj=tmerc +lat_0=0 +lon_0=15 +k=0.999923 +x_0=5500000 +y_0=0 +ellps=GRS80 +units=m +no_defs")
jan_temp_weird_crs <- project(jan_temp, weird_crs)
crs(jan_temp_weird_crs, proj = TRUE)
## [1] "+proj=tmerc +lat_0=0 +lon_0=15 +k=0.999923 +x_0=5500000 +y_0=0 +ellps=GRS80 +units=m +no_defs"
jan_temp is a rast object and has
information about grid spacing, coordinates etc. Note that the
description of the object tells you whether it is held in memory (small
raster files) or on disk.
plot(jan_temp, main = "NCEP NCAR January LTM Tair")
And you should be able to see the outline of the continents. We can
convert this to the more commonly used (and UK-centric :) ) -180 to 180
by the function rotate(). We’ll also use a different color
palette and overlay country polygons. You can use sf
objects for this, or use the terra package own vector
format (vect). We’ll use that here, as it has some
advantages for spatial overlays. To read in a shapefile in this format,
use the vect function:
jan_temp = rotate(jan_temp)
# using RColorBrewer
my.pal <- brewer.pal(n = 9, name = "OrRd")
plot(jan_temp,
main = "NCEP NCAR January LTM Tair",
col = my.pal)
countries <- vect("./ne_50m_admin_0_countries/ne_50m_admin_0_countries.shp")
plot(countries, add = TRUE)
To see other color palettes in the RColorBrewer
package, run display.brewer.all() in the console. You can
find out more details and explore visualizations with these palettes here.
The function global() can be used to calculate most
summary statistics for a raster layer. So to get the mean global
temperature (and standard deviation):
global(jan_temp, mean)
## mean
## air_1 3.546352
global(jan_temp, sd)
## sd
## air_1 19.63948
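global() also accepts an na.rm argument, which matters for rasters with missing cells (our temperature grid has none, but a masked raster like the Canada example later on will). A toy sketch:

```r
library(terra)

r <- rast(nrow = 10, ncol = 10, vals = 1:100)
r[1:10] <- NA  # knock out the first row of cells

global(r, mean)                 # NA, because missing cells propagate
global(r, mean, na.rm = TRUE)   # mean of the remaining cells, 55.5
```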
If we want to use only a subset of the original raster layer, the
function crop() will extract only the cells in a given
region. This can be defined using another raster or vector
object, or by defining an extent (ext) object:
# extent method
canada_ext <- ext(c(xmin = -142,
xmax = -52,
ymin = 41,
ymax = 84))
canada_air_temp <- crop(jan_temp, canada_ext)
If we extract the polygon corresponding to the outline of Canada, we can use that instead:
canada <- countries[countries$NAME == "Canada", ]
canada_air_temp <- crop(jan_temp, canada)
# plot
plot(canada_air_temp, main = "NCEP NCAR January LTM Tair", col = my.pal)
plot(canada, add = TRUE)
Note that crop subsets the original raster to the extent
of Canada’s borders, rather than to the borders themselves. This is
because rasters are always rectangular. You can ‘hide’ the
values of raster cells outside of a polygon by using the
mask function. The raster has to be rectangular, so this
does not remove the cells outside the polygon. Rather, it sets their
value to NA.
canada_air_temp <- mask(canada_air_temp, mask = canada)
# plot
plot(canada_air_temp, main = "NCEP NCAR January LTM Tair", col = my.pal)
plot(canada, add = TRUE)
Values can be extracted from individual locations (or sets of
locations) using extract(). This can take a set of
coordinates in matrix form, or use a spatial vector object. To get the January
temperature of Salt Lake City:
extract(jan_temp, cbind(-111.9,40.76))
## air_1
## 1 -6.01876
We created a simple feature object earlier with the location of
samples in Western North America (wna_climate). We can now
use this, and the raster layer to get the January temperature for all
locations.
wna_air_temp_df <- extract(jan_temp,
wna_climate,
method = 'bilinear',
df = TRUE)
head(wna_air_temp_df)
## ID air_1
## 1 1 -9.826717
## 2 2 -18.532247
## 3 3 -3.518785
## 4 4 -7.644527
## 5 5 -19.814369
## 6 6 -13.629312
df = TRUE tells the function to return the extracted
values as a data.frame with two columns: an ID for each point and the
value of the raster cell at that point.
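Because the rows of the extracted data frame are in the same order as the features in wna_climate, the values can be attached back as a new attribute column. This is a sketch; jan_tmp is just an illustrative column name:

```r
# add the extracted January temperature as a new column in wna_climate
wna_climate$jan_tmp <- wna_air_temp_df$air_1
head(wna_climate$jan_tmp)
```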
This same approach allows you to extract pixels by polygon overlays.
china <- countries[countries$NAME == "China", ]
china_air_temp_df <- extract(jan_temp, china, df = TRUE)
head(china_air_temp_df)
## ID air_1
## 1 1 -23.21633
## 2 1 -23.51869
## 3 1 -22.07151
## 4 1 -20.61807
## 5 1 -23.53544
## 6 1 -24.09268
Note that the first column is an ID corresponding to the polygon that each cell falls within. When this function is used with a set of polygons, the cells from all polygons are stacked into one data frame, and the ID column lets us retrieve whichever subset we want.
two_countries <- rbind(china, canada)
two_countries_tjan <- extract(jan_temp, two_countries)
head(two_countries_tjan)
## ID air_1
## 1 1 -23.21633
## 2 1 -23.51869
## 3 1 -22.07151
## 4 1 -20.61807
## 5 1 -23.53544
## 6 1 -24.09268
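Before plotting, a quick numeric comparison of the two countries can be made with base R's tapply(), averaging the extracted values by polygon ID:

```r
# mean January temperature per polygon (here ID 1 = China, ID 2 = Canada)
tapply(two_countries_tjan$air_1, two_countries_tjan$ID, mean, na.rm = TRUE)
```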
And we can make a quick plot with ggplot2 to show the two distributions of temperature:
library(ggplot2)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.1.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:terra':
##
## intersect, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
two_countries_tjan %>%
mutate(NAME = ifelse(ID == 1, "China", "Canada")) %>%
ggplot(aes(x = air_1, fill = NAME)) +
geom_histogram(position = 'identity', alpha = 0.75)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The extract() function also takes an argument
fun. This allows you to calculate a summary statistic for
each set of pixels that is extracted (i.e. one per polygon). Here, we’ll
use this with countries to get an average value of January
temperature. We add this back as a new column in the countries object,
and then plot it:
countries$Jan_Tmp <- extract(jan_temp, countries, fun = mean)[, 'air_1']
countries_sf <- st_as_sf(countries)
ggplot(countries_sf) +
geom_sf(aes(fill = Jan_Tmp)) +
scale_fill_viridis(option = "inferno") +
labs(fill = "Temperature",
title = "Country average January temperature") +
theme_bw()
A useful extension to the basic raster functions is the use of
stacks: collections of raster layers that represent different
variables but share the same spatial extent and resolution. There's
nothing special that you need to do here - when we originally read in
the air temperature data, the rast object contained all 24
layers from the NetCDF file. This included 12 monthly layers and 12
layers indicating data quality. We’ll re-read this, but this time only
include the 12 monthly layers from the NetCDF file, and then work with
this. We read these in with rast(), specifying the layers
with lyrs=. We also set the extent and rotate the data.
air_temp <- rast("./air.mon.ltm.nc", lyrs = 1:12)
## Warning: [rast] unknown extent
crs(air_temp) <- "EPSG:4326"
set.ext(air_temp, value = c(0, 360, -90, 90))
air_temp = rotate(air_temp)
We can see that this has 12 layers, each with the same number of cells, extent, etc. The default names given to each layer are often unreadable, so we can add our own:
names(air_temp) <- paste("TAS", month.abb, sep = "_")
names(air_temp)
## [1] "TAS_Jan" "TAS_Feb" "TAS_Mar" "TAS_Apr" "TAS_May" "TAS_Jun" "TAS_Jul"
## [8] "TAS_Aug" "TAS_Sep" "TAS_Oct" "TAS_Nov" "TAS_Dec"
And now you can also pull out rasters by name:
# method 1
jan_temp <- air_temp$TAS_Jan
# method 2
jan_temp <- air_temp[["TAS_Jan"]]
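The same [[ ]] notation accepts several names at once, which makes seasonal summaries straightforward. As an illustrative sketch, a boreal summer (JJA) mean could be computed as:

```r
# select the three summer layers and average them cell-by-cell
summer <- air_temp[[c("TAS_Jun", "TAS_Jul", "TAS_Aug")]]
summer_mean <- mean(summer)
plot(summer_mean, main = "Mean JJA air temperature", col = my.pal)
```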
A useful feature when working with stacks is that any of the
functions we used previously with one layer will be used across all
layers when applied to the stack. The plot() function, for
example, returns a grid with one plot per layer. Setting the
range argument ensures that all figures use the same range
of colors:
plot(air_temp,
col = my.pal,
type = 'continuous',
range = c(-40, 40))
Adding a shapefile (or other spatial information) is a little more
complex. We create a simple function to draw the country borders, then
pass it as the fun argument in the call to
plot():
addBorder <- function() lines(countries)
plot(air_temp,
col = my.pal,
type = 'continuous',
breaks = seq(-35, 40, by = 5),
fun = addBorder)
The global() function now returns the mean (or other
statistic) for all layers, allowing a quick look at the seasonal cycle
of average air temperature.
tavg <- global(air_temp, mean)
plot(1:12, tavg$mean,
type = 'l',
xlab = "Month",
ylab = "Avg T (C)")
And we can do the same for an individual location using
extract(). We’ll convert this to a dataframe and plot using
ggplot2:
slc.tavg <- t(extract(air_temp, cbind(-111.9,40.76)))
plot_df <- data.frame(month = 1:12,
tavg = slc.tavg[, 1])
ggplot(plot_df, aes(x = month, y = tavg)) +
geom_line() +
scale_x_continuous("Month") +
scale_y_continuous("Deg C") +
ggtitle("SLC Temperatures (LTM)") +
theme_bw()
The exercise for the module will get you to make some simple maps with new datasets. You should submit the results as an R script with the commands you used, plus a Word (or equivalent) document with any results or figures, or as an html page generated from an R Markdown document.
In the zipfile countries.zip is a global shapefile with various socio-economic indicators for different countries. Load this file into R and make plots of any two of the following variables (the variable names are given in brackets). Try different color scales and transformations of the data to get the most informative maps.
- Estimated GDP (gdp_md_est)
- Estimated population (pop_est)
- Income group (income_grp)
The NetCDF file air.mon.mean.nc contains air temperature data from the NCEP NCAR project, but as mean monthly temperature for every year between 1948 and the present. Using the functions introduced in this lab, read in the data and make a dataset with the globally averaged temperature for every layer in the file (i.e. from January 1948 to present). Use this data to make a line plot, either using base R or ggplot2.